Cancerous Tissue Classification Using Microarray Gene Expression

Authors

  • Pei-Chun Chen
  • Victoria Popic
  • Yuling Liu
Abstract

In this project, we apply machine learning techniques to classify tumor vs. normal tissue using gene expression microarray data, which has proven useful for early-stage cancer diagnosis and cancer subtype identification. We compare supervised learning (k-nearest-neighbors, SVMs, boosting) and unsupervised learning (k-means clustering, hierarchical clustering) methods on three datasets: GSE3 (renal clear cell carcinoma), GSE8054 (pancreatic cancer), and SRBCT (small round blue-cell tumors). To eliminate non-informative genes from the datasets, we apply feature selection using the t-test, a differential expression test, and the pairwise correlation coefficient between the class labels and each gene; this improved classification accuracy for most of the methods we evaluated. We report the misclassification error rate for hold-out (0.3), 5-fold, and leave-one-out cross-validation. In agreement with prior work, we found machine learning techniques to be very effective for this classification task. For binary classification (cancer vs. normal), the highest accuracy (close to 95% for GSE3 and more than 99% for GSE8054) was achieved with AdaBoost and a linear-kernel SVM. For multi-class classification (SRBCT tumor subtypes), we achieve an accuracy of 100% with a linear-kernel SVM without feature selection and 98% after reducing the feature dimension by 4 using the correlation coefficient feature selection technique.
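The workflow summarized in the abstract (rank genes with a t-test, keep the top-scoring ones, classify, and estimate error with leave-one-out cross-validation) can be illustrated with a minimal sketch. This is not the authors' code. It assumes Welch's unequal-variance form of the t-statistic (the abstract only says "t-test"), a k-nearest-neighbors classifier with an illustrative k = 3, and synthetic toy data rather than the GSE3, GSE8054, or SRBCT datasets. All function names and parameter values are hypothetical. Gene selection is repeated inside each fold so that the held-out sample never influences which genes are kept; that is a design choice of this sketch, not something the abstract states.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two samples (assumed variant of the t-test)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    denom = math.sqrt(va / len(a) + vb / len(b))
    if denom == 0.0:
        return 0.0
    return (statistics.mean(a) - statistics.mean(b)) / denom


def top_genes_by_t(X, y, n_keep):
    """Rank genes (columns of X) by |t| between classes 1 and 0; keep n_keep."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(welch_t(pos, neg)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:n_keep]]


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    dists = sorted((math.dist(row, x), label) for row, label in zip(X_train, y_train))
    votes = [label for _, label in dists[:k]]
    return 1 if sum(votes) * 2 > len(votes) else 0


def loocv_error(X, y, n_keep=5, k=3):
    """Leave-one-out misclassification rate with per-fold gene selection."""
    errors = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = top_genes_by_t(X_tr, y_tr, n_keep)
        X_tr_sel = [[row[j] for j in genes] for row in X_tr]
        x_sel = [X[i][j] for j in genes]
        if knn_predict(X_tr_sel, y_tr, x_sel, k) != y[i]:
            errors += 1
    return errors / len(X)


def make_toy_data(n_per_class=15, n_genes=50, n_informative=5, seed=0):
    """Synthetic expression matrix: the first n_informative genes shift with class."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([
                rng.gauss(3.0 * label if j < n_informative else 0.0, 1.0)
                for j in range(n_genes)
            ])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("selected genes:", sorted(top_genes_by_t(X, y, 5)))
    print("LOOCV error rate:", loocv_error(X, y))
```

Replacing `loocv_error` with a loop over five disjoint folds, or a single 70/30 split, would give the other two evaluation schemes named in the abstract.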


Similar resources

Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest

Background & objective: Microarray and next generation sequencing (NGS) data are important sources for finding helpful molecular patterns. Also, the large volume of gene expression data increases the challenge of identifying the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...


Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine

DNA microarray gene expression data give access to a wealth of information across thousands of variables (genes). Analyzing this information can reveal the genetic causes of disease and the differences between tumors. In this study we try to reduce the high-dimensional data with a statistical method to select valuable, high-impact genes as biomarkers and then classify ovarian tumors based on gene expression data of...


Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...


Integration and Reduction of Microarray Gene Expressions Using an Information Theory Approach

The DNA microarray is an important technique that allows researchers to analyze the expression of many genes in parallel. Although the data can be more significant if they come from separate experiments, one of the most challenging phases in the microarray context is the integration of separate expression level datasets that have been gathered through different techniques. In this paper, we prese...


Evaluation of CDH1&RUNX3 Expression in Cancerous and Normal Tissue of Patients With Gastric Cancer

Introduction: Gastric cancer is a multifactorial disease, the fourth most common cancer in the world, and the second leading cause of cancer death. This study was designed and performed to investigate CDH1 and RUNX3 gene expression in healthy and tumor marginal tissue of people with gastric cancer. Methods: In this case-control study, 64 samples including 32 samples of gastric tumor tissue a...


Prediction of blood cancer using leukemia gene expression data and sparsity-based gene selection methods

Background: DNA microarray is a useful technology that simultaneously assesses the expression of thousands of genes. It can be utilized for the detection of cancer types and cancer biomarkers. This study aimed to predict blood cancer using leukemia gene expression data and a robust ℓ2,p-norm sparsity-based gene selection method. Materials and Methods: In this descriptive study, the microarray ...




Publication date: 2012